Amazon Kinesis Data Firehose For S3 Destination Using CloudFormation
In this blog post, I build an Amazon Kinesis Data Firehose delivery stream that delivers data to Amazon Simple Storage Service (Amazon S3), using CloudFormation. Before getting to the implementation, let's look at what Amazon Kinesis and Amazon Kinesis Data Firehose are.
Amazon Kinesis
Amazon Kinesis is an AWS service that makes it easy to collect, process, and analyse streaming data in near real time at massive scale. Kinesis applications can be used to build dashboards, capture exceptions and generate alerts, drive recommendations, and make other near real-time business or operational decisions. Amazon Kinesis provides four streaming data services:
- Amazon Kinesis Data Streams: collects and processes large streams of data records in near real time.
- Amazon Kinesis Data Firehose: delivers near real-time streaming data to destinations such as Amazon S3 and Amazon Redshift.
- Amazon Kinesis Data Analytics: processes and analyzes streaming data using standard SQL.
- Amazon Kinesis Video Streams: a fully managed service for streaming live video from devices.
For more information, see Kinesis
Amazon Kinesis Data Firehose
Amazon Kinesis Data Firehose is a fully managed AWS service that delivers near real-time streaming data to data lakes, data stores, and analytics services. The destinations it supports include:
- Amazon S3: easy-to-use object storage
- Amazon Redshift: a petabyte-scale data warehouse
- Amazon Elasticsearch Service: a managed service for the open source Elasticsearch search and analytics engine
- Splunk: a third-party operational intelligence tool for analyzing machine-generated data
For more information, see Amazon Kinesis Data Firehose
Amazon Kinesis Data Firehose Delivery Stream
The AWS::KinesisFirehose::DeliveryStream resource creates an Amazon Kinesis Data Firehose delivery stream that delivers near real-time streaming data to an Amazon Simple Storage Service (Amazon S3), Amazon Redshift, or Amazon Elasticsearch Service (Amazon ES) destination. For more information, see aws-resource-kinesisfirehose-deliverystream
AWS CloudFormation
AWS CloudFormation simplifies resource provisioning and management for a wide range of AWS services. The CloudFormation service quickly and reliably provisions the application architectures (or ‘stacks’) that you model in CloudFormation template files. It is easy to update or replicate the stacks as needed. For more information, see AWS CloudFormation
Create CloudFormation Stack
1. Go to the CloudFormation Management Console and click Create stack.
2. Specify the template. A template is a JSON or YAML file that describes your stack's resources and properties. Use the following CloudFormation template to create a stack for a Kinesis Data Firehose delivery stream that delivers data to Amazon S3: save it as a YAML file, upload it, and click Next.
```yaml
AWSTemplateFormatVersion: '2010-09-09'
Description: Stack for Firehose DeliveryStream S3 Destination.
Resources:
  deliverystream:
    # Create the IAM policy first so the stream can write to the bucket.
    DependsOn:
      - deliveryPolicy
    Type: AWS::KinesisFirehose::DeliveryStream
    Properties:
      ExtendedS3DestinationConfiguration:
        BucketARN: !Join
          - ''
          - - 'arn:aws:s3:::'
            - !Ref s3bucket
        BufferingHints:
          IntervalInSeconds: 60
          SizeInMBs: 50
        CompressionFormat: UNCOMPRESSED
        Prefix: firehose/
        RoleARN: !GetAtt deliveryRole.Arn
        ProcessingConfiguration:
          Enabled: true
          Processors:
            - Parameters:
                - ParameterName: LambdaArn
                  # Placeholder: replace with the ARN of your Lambda function.
                  ParameterValue: 'arn:aws:lambda:XX (Lambda Arn)'
              Type: Lambda
  s3bucket:
    Type: AWS::S3::Bucket
    Properties:
      VersioningConfiguration:
        Status: Enabled
  deliveryRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Sid: ''
            Effect: Allow
            Principal:
              Service: firehose.amazonaws.com
            Action: 'sts:AssumeRole'
            Condition:
              StringEquals:
                # Placeholder: replace with your AWS account ID.
                'sts:ExternalId': 'XXXXXXXX(Your AWS AccountID)'
  deliveryPolicy:
    Type: AWS::IAM::Policy
    Properties:
      PolicyName: firehose_delivery_policy
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action:
              - 's3:AbortMultipartUpload'
              - 's3:GetBucketLocation'
              - 's3:GetObject'
              - 's3:ListBucket'
              - 's3:ListBucketMultipartUploads'
              - 's3:PutObject'
            Resource:
              # The bucket itself (the original listed the wildcard ARN twice).
              - !Join
                - ''
                - - 'arn:aws:s3:::'
                  - !Ref s3bucket
              # Everything matching the bucket ARN followed by a wildcard.
              - !Join
                - ''
                - - 'arn:aws:s3:::'
                  - !Ref s3bucket
                  - '*'
      Roles:
        - !Ref deliveryRole
```
3. Specify the stack details. Give the stack a name of your choice and click Next.
4. Configure the stack options. Add tags if you want them (they are optional) and click Next.
5. Review the settings and deploy the stack.
Properties
DeliveryStreamName
The name of the delivery stream.
DeliveryStreamType
The delivery stream type. It can be one of the following values:
- DirectPut: Provider applications access the delivery stream directly.
- KinesisStreamAsSource: The delivery stream uses a Kinesis data stream as a source.
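As a minimal sketch, this is how the two properties above could look inside the Properties map of the delivery stream resource. The name my-firehose-to-s3 is a made-up example and is not part of the template in this post, which lets CloudFormation generate the name.

```yaml
# Fragment of the Properties map of an AWS::KinesisFirehose::DeliveryStream.
Properties:
  # Hypothetical name, chosen only for illustration.
  DeliveryStreamName: my-firehose-to-s3
  # Producers write records straight to the delivery stream.
  DeliveryStreamType: DirectPut
```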
ExtendedS3DestinationConfiguration
An Amazon S3 destination for the delivery stream. Conditional. You must specify only one destination configuration.
S3DestinationConfiguration
The S3DestinationConfiguration property type specifies an Amazon Simple Storage Service (Amazon S3) destination to which Amazon Kinesis Data Firehose (Kinesis Data Firehose) delivers data. Conditional. You must specify only one destination configuration.
Ref
When the logical ID of this resource is provided to the Ref intrinsic function, Ref returns the delivery stream name, such as mystack-deliverystream-1ABCD2EF3GHIJ.
Fn::GetAtt
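Fn::GetAtt returns a value for a specified attribute of this resource type. The available attribute is Arn, which returns the Amazon Resource Name (ARN) of the delivery stream, such as arn:aws:firehose:us-east-2:123456789012:deliverystream/delivery-stream-name.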
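One place where both return values are handy is an Outputs section. The sketch below assumes the logical ID deliverystream from the template in this post; the output names are hypothetical and can be anything you like.

```yaml
Outputs:
  # Hypothetical output names; deliverystream is the logical ID used above.
  DeliveryStreamName:
    Description: Name of the delivery stream, as returned by Ref.
    Value: !Ref deliverystream
  DeliveryStreamArn:
    Description: ARN of the delivery stream, read from its Arn attribute.
    Value: !GetAtt deliverystream.Arn
```

After the stack is deployed, these values appear on the Outputs tab of the stack in the CloudFormation console.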
Resources
The Resources section of the template defines the resources to be provisioned in the stack.
BucketARN
The Amazon Resource Name (ARN) of the Amazon S3 bucket.
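When the bucket is defined in the same template, as s3bucket is here, a shorter alternative to joining strings is to read the bucket's Arn attribute. This is a sketch of that variation, not what the template in this post does.

```yaml
# Fragment of the delivery stream's Properties map.
ExtendedS3DestinationConfiguration:
  # Resolves to the same ARN as joining 'arn:aws:s3:::' with the bucket name.
  BucketARN: !GetAtt s3bucket.Arn
```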
Tags
A tag is a key-value pair that you can define and assign to AWS resources. Tags are metadata. You can specify up to 50 tags when creating a delivery stream.
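The template in this post does not set any tags on the delivery stream itself. If you wanted to, a sketch could look like this, with keys and values that are purely illustrative:

```yaml
# Fragment of the delivery stream's Properties map.
Properties:
  Tags:
    # Example key-value pairs; use whatever your tagging scheme requires.
    - Key: project
      Value: firehose-demo
    - Key: environment
      Value: test
```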
RoleARN
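The Amazon Resource Name (ARN) of the IAM role that Kinesis Data Firehose assumes to access the destination. In the template above, this is the ARN of the deliveryRole resource.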
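The template in this post hardcodes the account ID in the role's trust policy as a placeholder. One possible variation, sketched below, is to let CloudFormation fill it in through the AWS::AccountId pseudo parameter so the template works unchanged in any account.

```yaml
deliveryRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Principal:
            Service: firehose.amazonaws.com
          Action: 'sts:AssumeRole'
          Condition:
            StringEquals:
              # Resolves to the ID of the account the stack is deployed in.
              'sts:ExternalId': !Ref 'AWS::AccountId'
```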
Prefix
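An extra prefix that is added in front of the time format prefix. Kinesis Data Firehose automatically uses the YYYY/MM/DD/HH time format prefix (in UTC) for delivered Amazon S3 files, so with Prefix set to firehose/ the objects are written under firehose/YYYY/MM/DD/HH/.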
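To make the resulting key layout concrete, here is the Prefix setting from the template with the expected folder structure spelled out in comments. The date shown is an arbitrary illustration, not real output.

```yaml
# Fragment of the delivery stream's Properties map.
ExtendedS3DestinationConfiguration:
  # Custom prefix, followed by the automatic YYYY/MM/DD/HH time prefix.
  # A record delivered on 15 January 2024 between 13:00 and 14:00 UTC
  # would therefore be stored under firehose/2024/01/15/13/.
  Prefix: firehose/
```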
CompressionFormat
The compression format. If no value is specified, the default is UNCOMPRESSED. Allowed values: GZIP | HADOOP_SNAPPY | Snappy | UNCOMPRESSED | ZIP
ProcessingConfiguration
The data processing configuration for the Kinesis Data Firehose delivery stream.
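The delivery policy in this post grants Amazon S3 permissions only, so the role also needs permission to call the Lambda function used as a processor. The sketch below shows one way to wire that up. The TransformLambdaArn parameter and the lambdaInvokePolicy resource are hypothetical names that are not in the original template, and depending on how the function is referenced, the policy resource may need a version or alias qualifier.

```yaml
Parameters:
  # Hypothetical parameter: the ARN of your transformation Lambda function.
  TransformLambdaArn:
    Type: String
Resources:
  deliverystream:
    Type: AWS::KinesisFirehose::DeliveryStream
    # Make sure the Lambda permissions exist before the stream is created.
    DependsOn:
      - lambdaInvokePolicy
    Properties:
      ExtendedS3DestinationConfiguration:
        BucketARN: !GetAtt s3bucket.Arn
        RoleARN: !GetAtt deliveryRole.Arn
        ProcessingConfiguration:
          Enabled: true
          Processors:
            - Type: Lambda
              Parameters:
                - ParameterName: LambdaArn
                  ParameterValue: !Ref TransformLambdaArn
  # Hypothetical extra policy attached to the deliveryRole from the template.
  lambdaInvokePolicy:
    Type: AWS::IAM::Policy
    Properties:
      PolicyName: firehose_lambda_invoke_policy
      PolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Action:
              - 'lambda:InvokeFunction'
              - 'lambda:GetFunctionConfiguration'
            Resource: !Ref TransformLambdaArn
      Roles:
        - !Ref deliveryRole
```

The s3bucket and deliveryRole resources are assumed to be defined as in the template above. The value for TransformLambdaArn is entered on the Specify stack details page when the stack is created.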
BufferingHints
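The buffering options. Kinesis Data Firehose buffers incoming data before delivering it to Amazon S3 and flushes the buffer when either the size (SizeInMBs) or the time interval (IntervalInSeconds) is reached, whichever happens first. The template above uses 50 MB and 60 seconds.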
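As an illustration of the trade-off, the sketch below uses a smaller buffer size and a longer interval than the template in this post. The numbers are example values only; pick them based on how quickly you need objects to appear and how large you want each object to be.

```yaml
# Fragment of the delivery stream's Properties map.
ExtendedS3DestinationConfiguration:
  BufferingHints:
    # Deliver after 300 seconds or 5 MB of buffered data,
    # whichever condition is met first.
    IntervalInSeconds: 300
    SizeInMBs: 5
```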
Test The Delivery Stream
The Kinesis Data Firehose delivery stream has now been created.
Let's test it. Open the delivery stream in the console and expand the Test with demo data section.
Click Start sending demo data to start sending records to the delivery stream. When you are done, click Stop sending demo data to avoid further cost. Depending on the buffering configuration of the delivery stream, it might take a few minutes for new objects to appear in your bucket. Go to the destination S3 bucket and verify that the streaming data has been uploaded. If your Lambda processor is meant to remove the CHANGE attribute from the records, also check that the delivered data no longer contains it.
The Amazon Kinesis Data Firehose delivery stream for S3 has been created with CloudFormation and tested successfully. If you want to send the data to another destination, follow the Amazon Kinesis Data Firehose documentation: create-destination-s3, aws-resource-kinesisfirehose-deliverystream